    The paradoxical role of emotional intensity in the perception of vocal affect

    Vocalizations such as laughter, cries, moans, or screams constitute a potent source of information about the affective states of others. It is typically assumed that the more intense the expressed emotion, the better the classification of affective information. However, attempts to map the relation between affective intensity and inferred meaning remain controversial. Using a newly developed stimulus database of carefully validated non-speech expressions spanning the entire intensity range from low to peak, we show that this intuition is false. Across three experiments (N = 90), we demonstrate that intensity in fact plays a paradoxical role. Participants rated the authenticity, intensity, valence, and arousal of a wide range of vocalizations and classified the expressed emotion. Listeners are clearly able to infer expressed intensity and arousal; in contrast, and surprisingly, emotion category and valence have a perceptual sweet spot: moderate and strong emotions are clearly categorized, but peak emotions are maximally ambiguous. This finding, which converges with related observations from visual experiments, raises interesting theoretical challenges for the emotion communication literature.
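
    The sweet-spot claim has a simple statistical signature: if categorization accuracy follows an inverted U over expressed intensity, the quadratic term of a logistic model should be reliably negative. A minimal sketch with simulated data (all variable names are hypothetical, not taken from the study):

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical trial-level data: 'intensity' is the expressed intensity
    # of a vocalization (0 = low, 1 = peak); 'correct' is whether the
    # listener categorized the emotion correctly.
    rng = np.random.default_rng(0)
    intensity = rng.uniform(0, 1, 2000)
    # Simulate the reported sweet spot: accuracy peaks at moderate intensity.
    p = 1 / (1 + np.exp(-(0.5 + 4 * intensity - 4.5 * intensity**2)))
    df = pd.DataFrame({"intensity": intensity,
                       "correct": rng.binomial(1, p)})

    # A negative quadratic coefficient is the inverted-U signature:
    # categorization improves with intensity up to a point, then degrades
    # for peak emotions.
    fit = smf.logit("correct ~ intensity + I(intensity**2)", data=df).fit()
    print(fit.params)
    ```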

    Perception of Nigerian DĂčndĂșn talking drum performances as speech-like vs. music-like: The role of familiarity and acoustic cues

    It seems trivial to identify sound sequences as music or speech, particularly when the sequences come from different sound sources, such as an orchestra and a human voice. Can we also easily distinguish these categories when the sequences come from the same sound source? And on the basis of which acoustic features? We investigated these questions by examining listeners’ classification of sound sequences performed on an instrument that intertwines speech and music: the dĂčndĂșn talking drum. The dĂčndĂșn is commonly used in south-west Nigeria as a musical instrument but is also well suited to linguistic use as one of the African speech surrogates. One hundred seven participants from diverse geographical locations (15 different mother tongues represented) took part in an online experiment. Fifty-one participants reported being familiar with the dĂčndĂșn talking drum, 55% of those being speakers of YorĂčbĂĄ. During the experiment, participants listened to 30 dĂčndĂșn samples of about 7 s each, performed either as music or as YorĂčbĂĄ speech surrogate (n = 15 each) by a professional musician, and were asked to classify each sample as music or speech-like. The classification task revealed the listeners’ ability to identify the samples as intended by the performer, particularly when they were familiar with the dĂčndĂșn, though even unfamiliar participants performed above chance. A logistic regression predicting participants’ classification of the samples from several acoustic features confirmed the perceptual relevance of intensity, pitch, timbre, and timing measures and their interaction with listener familiarity. In all, this study provides empirical evidence for the discriminating role of acoustic features and the modulatory role of familiarity in teasing apart speech and music.
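
    The reported analysis lends itself to a compact sketch: a logistic regression on trial-level classifications with acoustic predictors crossed with familiarity. The data below are simulated and the predictor names are assumptions, not the study's exact feature set:

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in for the trial-level data.
    rng = np.random.default_rng(1)
    n = 3000
    df = pd.DataFrame({
        "intensity_var": rng.normal(size=n),    # variability in loudness
        "pitch_mean": rng.normal(size=n),       # average fundamental frequency
        "timbre_centroid": rng.normal(size=n),  # spectral centroid summary
        "ioi_regularity": rng.normal(size=n),   # timing regularity
        "familiar": rng.integers(0, 2, n),      # familiarity with the dĂčndĂșn
    })
    logit = (0.8 * df.ioi_regularity - 0.5 * df.intensity_var
             + 0.6 * df.familiar * df.ioi_regularity)
    df["music_like"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    # '*' expands to main effects plus interactions with familiarity,
    # mirroring the modulatory role reported in the study.
    model = smf.logit(
        "music_like ~ (intensity_var + pitch_mean + timbre_centroid"
        " + ioi_regularity) * familiar",
        data=df,
    ).fit()
    print(model.summary())
    ```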

    The DĂčndĂșn Drum helps us understand how we process speech and music

    Every day, you hear many sounds in your environment, like speech, music, animal calls, or passing cars. How do you tease apart these unique categories of sounds? We aimed to understand more about how people distinguish speech and music by using an instrument that can both “speak” and play music: the dĂčndĂșn talking drum. We were interested in whether people could tell if the sound produced by the drum was speech or music. People who were familiar with the dĂčndĂșn were good at the task, but so were those who had never heard the dĂčndĂșn, suggesting that there are general characteristics of sound that define speech and music categories. We observed that music is faster, more regular, and more variable in volume than “speech.” This research helps us understand the interesting instrument that is the dĂčndĂșn and provides insights about how humans distinguish two important types of sound: speech and music.

    Exploring emotional prototypes in a high dimensional TTS latent space

    Recent TTS systems are able to generate prosodically varied and realistic speech. However, it is unclear how this prosodic variation contributes to the perception of speakers’ emotional states. Here we use the recent psychological paradigm ‘Gibbs Sampling with People’ to search the prosodic latent space of a trained Global Style Token Tacotron model for prototypes of emotional prosody. Participants are recruited online and collectively manipulate the latent space of the generative speech model in a sequentially adaptive way, so that the stimulus presented to one group of participants is determined by the responses of the previous groups. We demonstrate that (1) particular regions of the model’s latent space are reliably associated with particular emotions, (2) the resulting emotional prototypes are well-recognized by a separate group of human raters, and (3) these emotional prototypes can be effectively transferred to new sentences. Collectively, these experiments demonstrate a novel approach to the understanding of emotional speech by providing a tool to explore the relation between the latent space of generative models and human semantics.
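
    The sequentially adaptive procedure can be sketched as a coordinate-wise loop over latent dimensions: each trial fixes all dimensions but one and lets a participant set that one with a slider. The toy below simulates this; a hidden target vector stands in for both the GST-Tacotron renderer and the human raters, so every name here is illustrative:

    ```python
    import numpy as np

    DIMS = 10                      # dimensionality of the (reduced) latent space
    GRID = np.linspace(-3, 3, 25)  # slider positions mapped to latent values

    # Hidden 'emotional prototype' used only to simulate listeners; in the
    # real experiment each slider setting is rendered to speech by the
    # GST-Tacotron model and judged by participants.
    target = np.zeros(DIMS)
    target[3] = 2.0

    def simulated_slider_response(latent, dim):
        """Pick the slider value whose stimulus best matches the criterion
        (here: proximity to the hidden target)."""
        candidates = np.repeat(latent[None, :], len(GRID), axis=0)
        candidates[:, dim] = GRID
        return GRID[np.argmin(np.linalg.norm(candidates - target, axis=1))]

    latent = np.zeros(DIMS)
    for it in range(3 * DIMS):     # a few sweeps; each trial updates one dimension
        dim = it % DIMS
        latent[dim] = simulated_slider_response(latent, dim)
    print(latent.round(2))         # converges toward the simulated prototype
    ```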

    Gibbs sampling with people

    A core problem in cognitive science and machine learning is to understand how humans derive semantic representations from perceptual objects, such as color from an apple, pleasantness from a musical chord, or seriousness from a face. Markov Chain Monte Carlo with People (MCMCP) is a prominent method for studying such representations, in which participants are presented with binary choice trials constructed such that the decisions follow a Markov Chain Monte Carlo acceptance rule. However, while MCMCP has strong asymptotic properties, its binary choice paradigm generates relatively little information per trial, and its local proposal function makes it slow to explore the parameter space and find the modes of the distribution. Here we therefore generalize MCMCP to a continuous-sampling paradigm, where in each iteration the participant uses a slider to continuously manipulate a single stimulus dimension to optimize a given criterion such as 'pleasantness'. We formulate both methods from a utility-theory perspective, and show that the new method can be interpreted as 'Gibbs Sampling with People' (GSP). Further, we introduce an aggregation parameter to the transition step, and show that this parameter can be manipulated to flexibly shift between Gibbs sampling and deterministic optimization. In an initial study, we show GSP clearly outperforming MCMCP; we then show that GSP provides novel and interpretable results in three other domains, namely musical chords, vocal emotions, and faces. We validate these results through large-scale perceptual rating experiments. The final experiments use GSP to navigate the latent space of a state-of-the-art image synthesis network (StyleGAN), a promising approach for applying GSP to high-dimensional perceptual spaces. We conclude by discussing future cognitive applications and ethical implications.
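
    One way to read the aggregation parameter (an interpretation for illustration, not the paper's exact formulation): if each slider response is a draw from a softmax over the participant's utility along the manipulated dimension, sharpening that softmax shifts the chain from Gibbs sampling toward deterministic optimization:

    ```python
    import numpy as np

    def gsp_step(utility, grid, k=1, seed=None):
        """One GSP coordinate update along a single stimulus dimension.

        Slider positions are chosen with probability proportional to
        exp(k * utility). Reading k as the aggregation parameter: k = 1
        gives Gibbs-like sampling from the utility-derived distribution,
        while large k approaches deterministic optimization (always the
        utility-maximizing position).
        """
        rng = np.random.default_rng(seed)
        u = np.array([utility(x) for x in grid])
        p = np.exp(k * (u - u.max()))   # subtract max for numerical stability
        return rng.choice(grid, p=p / p.sum())

    # Example: a bimodal 'pleasantness' profile along one dimension.
    grid = np.linspace(-3, 3, 61)
    pleasantness = lambda x: -2 * min((x - 1.5) ** 2, (x + 1.0) ** 2)

    print(gsp_step(pleasantness, grid, k=1))   # noisy: visits both modes
    print(gsp_step(pleasantness, grid, k=50))  # near-deterministic: best mode
    ```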

    The Timbre Perception Test (TPT): A new interactive musical assessment tool to measure timbre perception ability

    To date, tests that measure individual differences in the ability to perceive musical timbre are scarce in the published literature. The lack of such a tool limits research on how timbre, a primary attribute of sound, is perceived and processed among individuals. The current paper describes the development of the Timbre Perception Test (TPT), in which participants use a slider to reproduce heard auditory stimuli that vary along three important dimensions of timbre: envelope, spectral flux, and spectral centroid. With a sample of 95 participants, the TPT was calibrated and validated against measures of related abilities and examined for its reliability. The results indicate that a short version (8 minutes) of the TPT has good explanatory support from a factor analysis model, acceptable internal reliability (α = .69, ω_t = .70), good test–retest reliability (r = .79), and substantial correlations with self-reported general musical sophistication (ρ = .63) and pitch discrimination (ρ = .56), as well as somewhat lower correlations with duration discrimination (ρ = .27) and musical instrument discrimination abilities (ρ = .33). Overall, the TPT represents a robust tool for measuring an individual’s timbre perception ability. Furthermore, the use of sliders to perform a reproduction task has been shown to be an effective approach in threshold testing. The current version of the TPT is openly available for research purposes.
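
    For readers who want to compute comparable timbre descriptors on their own audio, a sketch using librosa (the library choice and the synthetic test signal are assumptions; the TPT itself ships its own calibrated stimuli):

    ```python
    import numpy as np
    import librosa

    # A synthetic test tone (a pitch glide) stands in for a real recording.
    sr = 22050
    y = librosa.chirp(fmin=220, fmax=880, sr=sr, duration=2.0)

    # Spectral centroid: the 'brightness' dimension.
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()

    # Spectral flux: frame-to-frame change in the magnitude spectrum.
    S = np.abs(librosa.stft(y))
    flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0)).mean()

    # Amplitude envelope (the third dimension) summarized via frame RMS.
    rms = librosa.feature.rms(y=y).mean()

    print(f"centroid ≈ {centroid:.0f} Hz, mean flux ≈ {flux:.2f}, rms ≈ {rms:.3f}")
    ```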

    Evaluation tools in singing education: A comparison of human and technological measures

    Two methods are currently used to assess intonation in vocal performance: human listeners and computer tools. In theory, evaluation relies on musical criteria, and the more closely performers sing in accordance with these criteria, the more accurate they are judged to be. In practice, evaluating musical performances must take into account several elements, such as the type of performance, the musician, and the judge carrying out the evaluation. By discussing the advantages and limits of the two current methods within the general framework of intonation evaluation, this chapter aims to provide practical information for making informed decisions when evaluating singing performances.
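
    On the computer-tool side, the core intonation measure is easy to state: the deviation, in cents, of a sung fundamental frequency from the nearest equal-tempered pitch. A minimal sketch, assuming f0 values have already been extracted by a pitch tracker:

    ```python
    import numpy as np

    def cents_off(f0_hz, a4=440.0):
        """Deviation from the nearest equal-tempered pitch, in cents
        (100 cents = one semitone). Positive values mean sharp."""
        semitones = 12 * np.log2(np.asarray(f0_hz) / a4)
        return 100 * (semitones - np.round(semitones))

    # Two notes sung near A4 and C5, slightly sharp and flat respectively.
    print(cents_off([446.0, 519.0]))  # ≈ [+23.4, -14.1] cents
    ```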